54 research outputs found

    Acoustic model adaptation for ortolan bunting (Emberiza hortulana L.) song-type classification

    Automatic systems for vocalization classification often require fairly large amounts of data on which to train models. However, collecting and transcribing animal vocalization data is a difficult and time-consuming task, making large data sets expensive to create. One natural solution to this problem is the use of acoustic adaptation methods. Such methods, common in human speech recognition systems, create initial models trained on speaker-independent data, then use small amounts of adaptation data to build individual-specific models. Since, as in human speech, individual vocal variability is a significant source of variation in bioacoustic data, acoustic model adaptation is naturally suited to classification in this domain as well. To demonstrate and evaluate the effectiveness of this approach, this paper presents the application of maximum likelihood linear regression (MLLR) adaptation to ortolan bunting (Emberiza hortulana L.) song-type classification. Classification accuracies for the adapted system are computed as a function of the amount of adaptation data and compared to caller-independent and caller-dependent systems. The experimental results indicate that, given the same amount of data, supervised adaptation significantly outperforms both caller-independent and caller-dependent systems.
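
    The core of MLLR adaptation is a shared affine transform of the Gaussian means in the caller-independent model. A minimal sketch of applying such a transform (estimation of A and b from adaptation data is omitted, and the function name is illustrative, not from the paper):

    ```python
    import numpy as np

    # Sketch of the core MLLR mean update: each Gaussian mean mu in the
    # caller-independent model is adapted as  mu_adapted = A @ mu + b,
    # where the affine transform (A, b) is shared across a class of
    # Gaussians and estimated from a small amount of adaptation data.
    def apply_mllr_transform(means, A, b):
        """Apply one shared MLLR transform to an array of Gaussian means.

        means : (n_gaussians, dim) array of caller-independent means
        A, b  : (dim, dim) matrix and (dim,) bias of the estimated transform
        """
        return means @ A.T + b

    # Toy example: an identity rotation plus a constant offset.
    means = np.array([[0.0, 1.0], [2.0, 3.0]])
    A = np.eye(2)
    b = np.array([0.5, -0.5])
    adapted = apply_mllr_transform(means, A, b)
    # adapted == [[0.5, 0.5], [2.5, 2.5]]
    ```

    Because one transform is shared across many Gaussians, only dim × (dim + 1) parameters need to be estimated, which is what makes adaptation feasible from small amounts of data.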

    Stress and Emotion Classification Using Jitter and Shimmer Features

    In this paper, we evaluate the use of appended jitter and shimmer speech features for the classification of human speaking styles and of animal vocalization arousal levels. Jitter and shimmer features are extracted from the fundamental frequency contour and added to baseline spectral features, specifically Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. The appended jitter and shimmer features result in an increase in classification accuracy for several illustrative datasets, including the SUSAS dataset for human speaking styles as well as vocalizations labeled by arousal level for African elephant and Rhesus monkey species.
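
    As a rough illustration, jitter and shimmer are commonly defined as the relative average perturbation of consecutive pitch periods and peak amplitudes, respectively; the paper's exact feature extraction may differ in detail:

    ```python
    # Hedged sketch of one common jitter/shimmer definition: mean absolute
    # cycle-to-cycle perturbation, normalized by the mean value.
    def jitter(periods):
        """Relative average perturbation of consecutive pitch periods."""
        diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
        return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

    def shimmer(amplitudes):
        """Relative average perturbation of consecutive peak amplitudes."""
        diffs = [abs(amplitudes[i] - amplitudes[i - 1]) for i in range(1, len(amplitudes))]
        return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

    # Perfectly steady phonation has zero jitter and zero shimmer.
    print(jitter([0.01, 0.01, 0.01]))   # 0.0
    print(shimmer([1.0, 1.0, 1.0]))     # 0.0
    ```

    Both measures capture voice-quality irregularity that the spectral envelope features (MFCCs/GFCCs) do not, which is why appending them can help arousal and stress classification.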

    Perceptually Motivated Wavelet Packet Transform for Bioacoustic Signal Enhancement

    A significant and often unavoidable problem in bioacoustic signal processing is the presence of background noise due to an adverse recording environment. This paper proposes a new bioacoustic signal enhancement technique which can be used on a wide range of species. The technique is based on a perceptually scaled wavelet packet decomposition using a species-specific Greenwood scale function. Spectral estimation techniques, similar to those used for human speech enhancement, are used for estimation of clean signal wavelet coefficients under an additive noise model. The new approach is compared to several other techniques, including basic bandpass filtering as well as classical speech enhancement methods such as spectral subtraction, Wiener filtering, and Ephraim–Malah filtering. Vocalizations recorded from several species are used for evaluation, including the ortolan bunting (Emberiza hortulana), rhesus monkey (Macaca mulatta), and humpback whale (Megaptera novaeangliae), with both additive white Gaussian noise and environmental recording noise added across a range of signal-to-noise ratios (SNRs). Results, measured by both SNR and segmental SNR of the enhanced waveforms, indicate that the proposed method outperforms other approaches for a wide range of noise conditions.
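
    The species-specific frequency warping behind the perceptual scaling is the Greenwood function, f(x) = A(10^(ax) − k) for normalized cochlear position x in [0, 1]. A minimal sketch, assuming the common convention of fitting A and a to the species' hearing range with k = 0.88 (the paper's exact constants may differ):

    ```python
    import math

    # Hedged sketch of the Greenwood warping used to place a species-specific
    # perceptual scale. Assumption: k is fixed at 0.88 and A, a are solved so
    # that x = 0 maps to f_min and x = 1 maps to f_max.
    def greenwood_params(f_min, f_max, k=0.88):
        A = f_min / (1.0 - k)
        a = math.log10(f_max / A + k)
        return A, a, k

    def greenwood(x, A, a, k):
        """Map normalized cochlear position x in [0, 1] to frequency in Hz."""
        return A * (10.0 ** (a * x) - k)

    # Example: a nominal hearing range of 20 Hz to 20 kHz.
    A, a, k = greenwood_params(20.0, 20000.0)
    print(round(greenwood(0.0, A, a, k)))  # 20    (bottom of the range)
    print(round(greenwood(1.0, A, a, k)))  # 20000 (top of the range)
    ```

    Substituting a species' measured hearing range yields a warping tailored to that species, which then determines how the wavelet packet tree allocates frequency resolution.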

    Acoustic Model Adaptation for Automatic Speech Recognition and Animal Vocalization Classification

    As I finalize my dissertation, I am reflecting on the two most valuable experiences of my doctoral process. The first was identifying a specific research direction for my Ph.D. study, and the second was figuring out how to pursue it. Generally, the right topic allows a student to complete his or her program in less time and with better quality, but it is very difficult to find this direction in the early stages of study. The research topics I worked on originally spanned many different areas in both human speech technologies and bioacoustics, including acoustic enhancement for improving audio quality, acoustic feature extraction at the front end of recognition systems, and investigation of the Lombard effect in the auditory system. Eventually, I settled on acoustic model adaptation as my dissertation topic, and I have gained much research experience and knowledge from all of these areas. Before deciding on my research direction, a thorough review of prior work was critical: it taught me what other researchers had done in this area and which parts of the direction were still open. However, one practical point I often ignored was how those researchers implemented their methods in terms of experimental work and software programming, so it was not until almost my last year as a Ph.D. student that I considered whether I could realistically implement the same experiments reported in their works. I finally realized that merely understanding complicated algorithms, such as expectation maximization (EM), was nothing to be proud of, because the derivation of statistical equations is a fundamental skill for a Ph.D. candidate in electrical engineering. An earlier awareness of programming implementation would have given me a better sense of the time and effort this research direction required, and helped me make a wise decision with respect to both theory and practice. This Ph.D. work was funded by the Dr. Dolittle project, which focuses on developing a broad framework for pattern analysis and classification of animal vocalizations by integrating successful models and ideas from the field of speech processing and recognition into bioacoustics (Johnson et al., 2003). My work therefore naturally consists of both a theoretical aspect for human speech and a practical aspect for bioacoustic application. Although the field of bioacoustics is challenging due to its multidisciplinary nature, speech technology is its original foundation, and I am hopeful that my Ph.D. research will benefit both fields.

    Acoustic model and adaptation for automatic speech recognition and animal vocalization classification

    Automatic speech recognition (ASR) converts human speech to readable text. Acoustic model adaptation, also called speaker adaptation, is one of the most promising techniques in ASR for improving recognition accuracy. Adaptation works by tuning a general-purpose acoustic model to the specific person who is using it. Speaker adaptation methods can be categorized as Bayesian-based, transformation-based, and model combination-based. Model combination-based speaker adaptation has been shown to have an advantage over traditional Bayesian-based and transformation-based adaptation methods when the amount of adaptation speech is as small as a few seconds. However, model combination-based rapid speaker adaptation has not been widely used in practical applications, since it requires large amounts of speaker-dependent (SD) training data from multiple speakers. This research proposes a new technique, eigen-clustering, to eliminate the need for large quantities of speaker-labeled training utterances, so that model combination-based adaptation can start from much less expensive speaker-independent (SI) data. Based on principal component analysis (PCA), the technique constructs an eigenspace using each utterance in the training set. The proposed adaptation method can not only improve human speech recognition directly, but also potentially contribute to animal vocalization analysis and behavior studies. Application to the field of bioacoustics is especially meaningful because the amount of collected animal vocalization data is often limited, making fast adaptation methods naturally suitable.
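
    The PCA step at the heart of this idea can be sketched as follows, assuming each utterance has already been summarized as one fixed-length vector (e.g. a supervector of model parameters; that representation and the function names are assumptions for illustration, not the dissertation's exact procedure):

    ```python
    import numpy as np

    # Hedged sketch of building an eigenspace from per-utterance vectors
    # without speaker labels, via PCA on the centered data matrix.
    def build_eigenspace(utterance_vectors, n_components):
        """Return the mean and the top principal directions of the data."""
        X = np.asarray(utterance_vectors, dtype=float)
        mean = X.mean(axis=0)
        # SVD of the centered data; rows of Vt are the principal directions.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:n_components]

    def project(vector, mean, basis):
        """Coordinates of one utterance in the eigenspace."""
        return basis @ (np.asarray(vector, dtype=float) - mean)

    # Toy example: four utterance vectors varying mostly along one direction.
    X = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
    mean, basis = build_eigenspace(X, 1)
    coords = [project(v, mean, basis) for v in X]
    ```

    Because PCA needs no labels, every SI training utterance can contribute a point in the eigenspace, which is what removes the dependence on expensive speaker-labeled SD data.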

    Big data based fraud risk management at Alibaba

    With the development of mobile internet and finance, fraud risk comes in all shapes and sizes. This paper introduces fraud risk management at Alibaba in the era of big data. Alibaba has built a fraud risk monitoring and management system based on real-time big data processing and intelligent risk models. It captures fraud signals directly from huge amounts of user behavior and network data, analyzes them in real time using machine learning, and accurately predicts bad users and transactions. To extend this fraud prevention capability to external customers, Alibaba has also built a big-data-based fraud prevention product called AntBuckler. AntBuckler aims to identify and prevent all flavors of malicious behavior for online merchants and banks, with flexibility and intelligence. By combining large amounts of data from Alibaba and its customers, AntBuckler uses the RAIN score engine to quantify the risk levels of users or transactions for fraud prevention. It also has a user-friendly visualization UI showing risk scores, top reasons, and fraud connections.
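
    The RAIN score engine itself is proprietary and not described in detail here; as a purely illustrative sketch, a bounded risk score can be produced by a logistic combination of weighted fraud signals. All feature names and weights below are hypothetical:

    ```python
    import math

    # Generic, illustrative risk scoring: weighted fraud signals squashed
    # through a logistic function and scaled to a 0-100 score.
    def risk_score(signals, weights, bias=0.0):
        """Logistic combination of named fraud signals, scaled to 0-100."""
        z = bias + sum(weights[name] * value for name, value in signals.items())
        return 100.0 / (1.0 + math.exp(-z))

    # Hypothetical behavioral signals and weights.
    weights = {"new_device": 1.5, "velocity_anomaly": 2.0, "bad_ip_reputation": 2.5}
    benign = {"new_device": 0.0, "velocity_anomaly": 0.0, "bad_ip_reputation": 0.0}
    risky = {"new_device": 1.0, "velocity_anomaly": 1.0, "bad_ip_reputation": 1.0}
    print(risk_score(benign, weights, bias=-3.0) < risk_score(risky, weights, bias=-3.0))  # True
    ```

    A bounded score like this is easy to threshold for blocking decisions and to surface in a visualization UI alongside the top contributing signals.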